Weight Derivative

There are several methods to compute the weights of an artificial neural network (ANN); some of them require the partial derivative of the mse (mean squared error) with respect to each weight in the network. The formulas presented below show how to compute these partial derivatives for a single training case. To obtain the partial derivatives for the whole training set, average the per-case partial derivatives over all training cases.
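Because the training-set mse is the mean of the per-case errors, its partial derivative with respect to any weight is the mean of the per-case partial derivatives. The Python sketch below averages per-case derivative matrices element by element; the list-of-lists layout and the name `average_gradients` are illustrative assumptions, not part of the original text.

```python
from typing import List

Matrix = List[List[float]]


def average_gradients(per_case: List[Matrix]) -> Matrix:
    """Average per-case partial-derivative matrices element by element."""
    if not per_case:
        raise ValueError("need at least one training case")
    n = len(per_case)
    rows, cols = len(per_case[0]), len(per_case[0][0])
    avg = [[0.0] * cols for _ in range(rows)]
    for grad in per_case:
        for r in range(rows):
            for c in range(cols):
                # Each case contributes 1/n of its partial derivative
                avg[r][c] += grad[r][c] / n
    return avg


if __name__ == "__main__":
    # Two training cases, each with a 1x2 matrix of partial derivatives
    print(average_gradients([[[1.0, 2.0]], [[3.0, 4.0]]]))
```

Running it prints `[[2.0, 3.0]]`.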

Output Layer Weights

To compute the partial derivative of the mse with respect to each weight in the output layer, perform two steps:

For each neuron in the output layer compute its δ
For each weight in the output layer compute the respective partial derivative
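A minimal Python sketch of these two steps for one training case follows. It assumes (none of this is stated in the text) that each output neuron computes y_i = sum_j w_ij * x_j from the previous layer's outputs x_j with no bias term, that the per-case error is sum_i (z_i - t_i)^2, and that δ_i = 2 (z_i - t_i) f'(y_i), so that the partial derivative for weight w_ij is δ_i * x_j. The original figure may instead use (t_i - z_i), which only flips the sign of every derivative. The function name and parameters are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def output_layer_gradients(
    x: List[float],                       # outputs of the previous layer, x_j
    w: List[List[float]],                 # w[i][j]: weight from x_j to output neuron i
    t: List[float],                       # targets t_i
    f: Callable[[float], float],          # activation, z = f(y)
    df_from_z: Callable[[float], float],  # f'(y) expressed in terms of z
) -> Tuple[List[float], List[List[float]]]:
    # Forward pass: y_i = sum_j w[i][j] * x_j and z_i = f(y_i) (no bias, assumed)
    z = [f(sum(wij * xj for wij, xj in zip(row, x))) for row in w]
    # Step 1: delta_i = 2 * (z_i - t_i) * f'(y_i) for every output neuron
    delta = [2.0 * (zi - ti) * df_from_z(zi) for zi, ti in zip(z, t)]
    # Step 2: dE/dw[i][j] = delta_i * x_j for every output-layer weight
    grad = [[di * xj for xj in x] for di in delta]
    return delta, grad


if __name__ == "__main__":
    # Example with z = tanh(1.5 y), whose derivative is 1.5 * (1 - z^2)
    delta, grad = output_layer_gradients(
        x=[0.3, -0.7, 1.0],
        w=[[0.2, -0.1, 0.4], [-0.5, 0.3, 0.1]],
        t=[0.5, -0.2],
        f=lambda y: math.tanh(1.5 * y),
        df_from_z=lambda z: 1.5 * (1.0 - z * z),
    )
    print("deltas:", delta)
    print("dE/dw:", grad)
```

Passing the activation and its derivative as arguments keeps the two steps independent of the particular activation; for logsig, pass 1/(1 + e^(-y)) and z(1 - z) instead.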

The figure below illustrates how to compute these partial derivatives when the activation function is z = tanh(1.5y) or z = logsig(y), where y is the neuron's net input and z is its output. Here t_i is the target for output neuron i. Note also the factor of two in the computation of δ: because the derivative is used only to indicate direction, this factor can be removed from the computation of δ.
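Both activations have derivatives that can be written in terms of the output z alone: for z = tanh(1.5y), dz/dy = 1.5(1 - z^2); for z = logsig(y) = 1/(1 + e^(-y)), dz/dy = z(1 - z). The Python sketch below encodes these, together with a δ helper whose factor of two is optional. The helper names and the (z - t) sign convention are assumptions made for illustration.

```python
import math


def tanh15(y: float) -> float:
    """Activation z = tanh(1.5 * y)."""
    return math.tanh(1.5 * y)


def tanh15_prime_from_z(z: float) -> float:
    # d/dy tanh(1.5 y) = 1.5 * (1 - tanh(1.5 y)^2) = 1.5 * (1 - z^2)
    return 1.5 * (1.0 - z * z)


def logsig(y: float) -> float:
    """Activation z = 1 / (1 + exp(-y)), written to avoid overflow for large |y|."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


def logsig_prime_from_z(z: float) -> float:
    # d/dy logsig(y) = logsig(y) * (1 - logsig(y)) = z * (1 - z)
    return z * (1.0 - z)


def output_delta(z: float, t: float, dz_dy: float, keep_factor_two: bool = True) -> float:
    # delta = 2 * (z - t) * f'(y); without the factor of two the value is halved,
    # but its sign, and therefore the descent direction, is unchanged
    scale = 2.0 if keep_factor_two else 1.0
    return scale * (z - t) * dz_dy


if __name__ == "__main__":
    z = logsig(0.4)
    print(output_delta(z, 1.0, logsig_prime_from_z(z)))
    print(output_delta(z, 1.0, logsig_prime_from_z(z), keep_factor_two=False))
```

Dropping the factor of two halves every δ, and therefore every partial derivative, without changing any sign; under gradient descent this is equivalent to halving the learning rate.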

Hidden Layer Weights

To compute the partial derivative of the mse with respect to each weight in the hidden layer, perform two steps:

For each neuron in the hidden layer compute its δ (using the δs of the next layer)
For each weight in the hidden layer compute the respective partial derivative

The figure below illustrates how to compute these partial derivatives when the activation function is z = tanh(1.5y) or z = logsig(y).
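The Python sketch below applies the two hidden-layer steps to a network with a single hidden layer. It assumes z = tanh(1.5y) in both layers, no bias terms, a per-case error of sum_i (z_i - t_i)^2, and output deltas δ_i = 2 (z_i - t_i) f'(y_i). Each hidden δ_j is f'(y_j) times the sum of the next layer's δ_i weighted by the connections w_ij that leave hidden neuron j; the partial derivative for the weight from input k to hidden neuron j is then δ_j * x_k. The names `v`, `w`, `layer`, and `hidden_layer_gradients` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def f(y: float) -> float:
    # Assumed activation for both layers: z = tanh(1.5 y)
    return math.tanh(1.5 * y)


def df_from_z(z: float) -> float:
    # f'(y) = 1.5 * (1 - z^2) for z = tanh(1.5 y)
    return 1.5 * (1.0 - z * z)


def layer(inputs: List[float], weights: Matrix) -> List[float]:
    # z_i = f(sum_j weights[i][j] * inputs[j]); no bias term (assumption)
    return [f(sum(wt * a for wt, a in zip(row, inputs))) for row in weights]


def hidden_layer_gradients(x: List[float], v: Matrix, w: Matrix, t: List[float]) -> Matrix:
    """v[j][k]: input k -> hidden neuron j; w[i][j]: hidden neuron j -> output i."""
    h = layer(x, v)  # hidden-layer outputs
    z = layer(h, w)  # network outputs
    # Deltas of the next (output) layer: 2 * (z_i - t_i) * f'(y_i)
    delta_out = [2.0 * (zi - ti) * df_from_z(zi) for zi, ti in zip(z, t)]
    # Step 1: hidden delta_j = f'(y_j) * sum_i delta_out_i * w[i][j]
    delta_hid = [
        df_from_z(hj) * sum(delta_out[i] * w[i][j] for i in range(len(w)))
        for j, hj in enumerate(h)
    ]
    # Step 2: dE/dv[j][k] = delta_hid_j * x_k
    return [[dj * xk for xk in x] for dj in delta_hid]


if __name__ == "__main__":
    grads = hidden_layer_gradients(
        x=[0.5, -1.0],
        v=[[0.1, 0.4], [-0.3, 0.2], [0.7, -0.6]],
        w=[[0.3, -0.2, 0.5], [-0.4, 0.6, 0.1]],
        t=[0.2, -0.4],
    )
    print(grads)
```

For logsig, replace `f` and `df_from_z` with 1/(1 + e^(-y)) and z(1 - z). With more hidden layers, the same step repeats backward, each layer using the δs of the layer after it.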
